Overview

Dataset statistics

Number of variables: 10
Number of observations: 1599
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 220
Duplicate rows (%): 13.8%
Total size in memory: 125.0 KiB
Average record size in memory: 80.1 B

Variable types

Numeric: 10

Alerts

Dataset has 220 (13.8%) duplicate rows [Duplicates]
fixed acidity is highly correlated with citric acid and 1 other field [High correlation]
volatile acidity is highly correlated with citric acid [High correlation]
citric acid is highly correlated with fixed acidity and 1 other field [High correlation]
density is highly correlated with fixed acidity [High correlation]
fixed acidity is highly correlated with citric acid and 1 other field [High correlation]
volatile acidity is highly correlated with citric acid [High correlation]
citric acid is highly correlated with fixed acidity and 1 other field [High correlation]
density is highly correlated with fixed acidity [High correlation]
fixed acidity is highly correlated with citric acid and 2 other fields [High correlation]
citric acid is highly correlated with fixed acidity and 2 other fields [High correlation]
residual sugar is highly correlated with density [High correlation]
chlorides is highly correlated with citric acid and 1 other field [High correlation]
density is highly correlated with fixed acidity and 2 other fields [High correlation]
sulphates is highly correlated with citric acid and 1 other field [High correlation]
alcohol is highly correlated with fixed acidity and 1 other field [High correlation]
citric acid has 132 (8.3%) zeros [Zeros]
quality has 63 (3.9%) zeros [Zeros]
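
The percentages in these alerts are each a count divided by the 1599 observations, shown to one decimal place. A minimal sketch of that arithmetic follows; the helper name `pct` is an illustrative assumption and is not part of the report's tooling.

```python
def pct(count: int, total: int) -> str:
    """Format count/total as a percentage with one decimal place."""
    return f"{100 * count / total:.1f}%"


N_OBSERVATIONS = 1599  # "Number of observations" from the overview

print(pct(220, N_OBSERVATIONS))  # duplicate rows
print(pct(132, N_OBSERVATIONS))  # zeros in citric acid
print(pct(63, N_OBSERVATIONS))   # zeros in quality
```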

Reproduction

Analysis started: 2022-06-20 22:13:23.975453
Analysis finished: 2022-06-20 22:13:38.163695
Duration: 14.19 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

Variables

fixed acidity
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 96
Distinct (%): 6.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.319637273
Minimum: 4.6
Maximum: 15.9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 12.6 KiB

Quantile statistics

Minimum: 4.6
5-th percentile: 6.1
Q1: 7.1
median: 7.9
Q3: 9.2
95-th percentile: 11.8
Maximum: 15.9
Range: 11.3
Interquartile range (IQR): 2.1

Descriptive statistics

Standard deviation: 1.741096318
Coefficient of variation (CV): 0.2092755082
Kurtosis: 1.132143398
Mean: 8.319637273
Median Absolute Deviation (MAD): 1
Skewness: 0.9827514413
Sum: 13303.1
Variance: 3.031416389
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
7.2 | 67 | 4.2%
7.1 | 57 | 3.6%
7.8 | 53 | 3.3%
7.5 | 52 | 3.3%
7 | 50 | 3.1%
7.7 | 49 | 3.1%
7.6 | 46 | 2.9%
6.8 | 46 | 2.9%
8.2 | 45 | 2.8%
7.4 | 44 | 2.8%
Other values (86) | 1090 | 68.2%

Value | Count | Frequency (%)
4.6 | 1 | 0.1%
4.7 | 1 | 0.1%
4.9 | 1 | 0.1%
5 | 6 | 0.4%
5.1 | 4 | 0.3%
5.2 | 6 | 0.4%
5.3 | 4 | 0.3%
5.4 | 5 | 0.3%
5.5 | 1 | 0.1%
5.6 | 14 | 0.9%

Value | Count | Frequency (%)
15.9 | 1 | 0.1%
15.6 | 2 | 0.1%
15.5 | 2 | 0.1%
15 | 2 | 0.1%
14.3 | 1 | 0.1%
14 | 1 | 0.1%
13.8 | 1 | 0.1%
13.7 | 2 | 0.1%
13.5 | 1 | 0.1%
13.4 | 1 | 0.1%
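
Several of the statistics reported for fixed acidity are derived from one another: the mean is the sum divided by the number of observations, the variance is the square of the standard deviation, the coefficient of variation is the standard deviation divided by the mean, and the range and IQR are differences of quantiles. The sketch below checks these relationships using the values printed in this report; the variable names and the tolerance are illustrative choices.

```python
import math

# Values copied from the fixed acidity block of this report.
n = 1599
total = 13303.1
mean = 8.319637273
std = 1.741096318
variance = 3.031416389
cv = 0.2092755082
q1, q3 = 7.1, 9.2
minimum, maximum = 4.6, 15.9

# Each entry compares a recomputed value with the reported one.
checks = {
    "mean = sum / n": math.isclose(total / n, mean, rel_tol=1e-6),
    "variance = std ** 2": math.isclose(std ** 2, variance, rel_tol=1e-6),
    "cv = std / mean": math.isclose(std / mean, cv, rel_tol=1e-6),
    "iqr = q3 - q1": math.isclose(q3 - q1, 2.1, rel_tol=1e-6),
    "range = max - min": math.isclose(maximum - minimum, 11.3, rel_tol=1e-6),
}

for name, ok in checks.items():
    print(f"{name}: {'consistent' if ok else 'mismatch'}")
```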

volatile acidity
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION

Distinct: 143
Distinct (%): 8.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5278205128
Minimum: 0.12
Maximum: 1.58
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 12.6 KiB

Quantile statistics

Minimum: 0.12
5-th percentile: 0.27
Q1: 0.39
median: 0.52
Q3: 0.64
95-th percentile: 0.84
Maximum: 1.58
Range: 1.46
Interquartile range (IQR): 0.25

Descriptive statistics

Standard deviation: 0.1790597042
Coefficient of variation (CV): 0.3392435493
Kurtosis: 1.22554225
Mean: 0.5278205128
Median Absolute Deviation (MAD): 0.12
Skewness: 0.6715925724
Sum: 843.985
Variance: 0.03206237765
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0.6 | 47 | 2.9%
0.5 | 46 | 2.9%
0.43 | 43 | 2.7%
0.59 | 39 | 2.4%
0.36 | 38 | 2.4%
0.58 | 38 | 2.4%
0.4 | 37 | 2.3%
0.49 | 35 | 2.2%
0.38 | 35 | 2.2%
0.39 | 35 | 2.2%
Other values (133) | 1206 | 75.4%

Value | Count | Frequency (%)
0.12 | 3 | 0.2%
0.16 | 2 | 0.1%
0.18 | 10 | 0.6%
0.19 | 2 | 0.1%
0.2 | 3 | 0.2%
0.21 | 6 | 0.4%
0.22 | 6 | 0.4%
0.23 | 5 | 0.3%
0.24 | 13 | 0.8%
0.25 | 7 | 0.4%

Value | Count | Frequency (%)
1.58 | 1 | 0.1%
1.33 | 2 | 0.1%
1.24 | 1 | 0.1%
1.185 | 1 | 0.1%
1.18 | 1 | 0.1%
1.13 | 1 | 0.1%
1.115 | 1 | 0.1%
1.09 | 1 | 0.1%
1.07 | 1 | 0.1%
1.04 | 3 | 0.2%

citric acid
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 80
Distinct (%): 5.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.2709756098
Minimum: 0
Maximum: 1
Zeros: 132
Zeros (%): 8.3%
Negative: 0
Negative (%): 0.0%
Memory size: 12.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0.09
median: 0.26
Q3: 0.42
95-th percentile: 0.6
Maximum: 1
Range: 1
Interquartile range (IQR): 0.33

Descriptive statistics

Standard deviation: 0.1948011374
Coefficient of variation (CV): 0.7188880858
Kurtosis: -0.7889975154
Mean: 0.2709756098
Median Absolute Deviation (MAD): 0.17
Skewness: 0.3183372953
Sum: 433.29
Variance: 0.03794748313
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 132 | 8.3%
0.49 | 68 | 4.3%
0.24 | 51 | 3.2%
0.02 | 50 | 3.1%
0.26 | 38 | 2.4%
0.1 | 35 | 2.2%
0.21 | 33 | 2.1%
0.01 | 33 | 2.1%
0.08 | 33 | 2.1%
0.32 | 32 | 2.0%
Other values (70) | 1094 | 68.4%

Value | Count | Frequency (%)
0 | 132 | 8.3%
0.01 | 33 | 2.1%
0.02 | 50 | 3.1%
0.03 | 30 | 1.9%
0.04 | 29 | 1.8%
0.05 | 20 | 1.3%
0.06 | 24 | 1.5%
0.07 | 22 | 1.4%
0.08 | 33 | 2.1%
0.09 | 30 | 1.9%

Value | Count | Frequency (%)
1 | 1 | 0.1%
0.79 | 1 | 0.1%
0.78 | 1 | 0.1%
0.76 | 3 | 0.2%
0.75 | 1 | 0.1%
0.74 | 4 | 0.3%
0.73 | 3 | 0.2%
0.72 | 1 | 0.1%
0.71 | 1 | 0.1%
0.7 | 2 | 0.1%

residual sugar
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 91
Distinct (%): 5.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.538805503
Minimum: 0.9
Maximum: 15.5
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 12.6 KiB

Quantile statistics

Minimum: 0.9
5-th percentile: 1.59
Q1: 1.9
median: 2.2
Q3: 2.6
95-th percentile: 5.1
Maximum: 15.5
Range: 14.6
Interquartile range (IQR): 0.7

Descriptive statistics

Standard deviation: 1.40992806
Coefficient of variation (CV): 0.5553509545
Kurtosis: 28.61759542
Mean: 2.538805503
Median Absolute Deviation (MAD): 0.3
Skewness: 4.540655426
Sum: 4059.55
Variance: 1.987897133
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
2 | 156 | 9.8%
2.2 | 131 | 8.2%
1.8 | 129 | 8.1%
2.1 | 128 | 8.0%
1.9 | 117 | 7.3%
2.3 | 109 | 6.8%
2.4 | 86 | 5.4%
2.5 | 84 | 5.3%
2.6 | 79 | 4.9%
1.7 | 76 | 4.8%
Other values (81) | 504 | 31.5%

Value | Count | Frequency (%)
0.9 | 2 | 0.1%
1.2 | 8 | 0.5%
1.3 | 5 | 0.3%
1.4 | 35 | 2.2%
1.5 | 30 | 1.9%
1.6 | 58 | 3.6%
1.65 | 2 | 0.1%
1.7 | 76 | 4.8%
1.75 | 2 | 0.1%
1.8 | 129 | 8.1%

Value | Count | Frequency (%)
15.5 | 1 | 0.1%
15.4 | 2 | 0.1%
13.9 | 1 | 0.1%
13.8 | 2 | 0.1%
13.4 | 1 | 0.1%
12.9 | 1 | 0.1%
11 | 2 | 0.1%
10.7 | 1 | 0.1%
9 | 1 | 0.1%
8.9 | 1 | 0.1%

chlorides
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 153
Distinct (%): 9.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.08746654159
Minimum: 0.012
Maximum: 0.611
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 12.6 KiB

Quantile statistics

Minimum: 0.012
5-th percentile: 0.054
Q1: 0.07
median: 0.079
Q3: 0.09
95-th percentile: 0.1261
Maximum: 0.611
Range: 0.599
Interquartile range (IQR): 0.02

Descriptive statistics

Standard deviation: 0.04706530201
Coefficient of variation (CV): 0.5380949236
Kurtosis: 41.71578725
Mean: 0.08746654159
Median Absolute Deviation (MAD): 0.01
Skewness: 5.680346572
Sum: 139.859
Variance: 0.002215142653
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0.08 | 66 | 4.1%
0.074 | 55 | 3.4%
0.076 | 51 | 3.2%
0.078 | 51 | 3.2%
0.084 | 49 | 3.1%
0.077 | 47 | 2.9%
0.071 | 47 | 2.9%
0.082 | 46 | 2.9%
0.075 | 45 | 2.8%
0.079 | 43 | 2.7%
Other values (143) | 1099 | 68.7%

Value | Count | Frequency (%)
0.012 | 2 | 0.1%
0.034 | 1 | 0.1%
0.038 | 2 | 0.1%
0.039 | 4 | 0.3%
0.041 | 4 | 0.3%
0.042 | 3 | 0.2%
0.043 | 1 | 0.1%
0.044 | 5 | 0.3%
0.045 | 4 | 0.3%
0.046 | 4 | 0.3%

Value | Count | Frequency (%)
0.611 | 1 | 0.1%
0.61 | 1 | 0.1%
0.467 | 1 | 0.1%
0.464 | 1 | 0.1%
0.422 | 1 | 0.1%
0.415 | 3 | 0.2%
0.414 | 2 | 0.1%
0.413 | 1 | 0.1%
0.403 | 1 | 0.1%
0.401 | 1 | 0.1%

total sulfur dioxide
Real number (ℝ≥0)

Distinct: 144
Distinct (%): 9.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 46.46779237
Minimum: 6
Maximum: 289
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 12.6 KiB

Quantile statistics

Minimum: 6
5-th percentile: 11
Q1: 22
median: 38
Q3: 62
95-th percentile: 112.1
Maximum: 289
Range: 283
Interquartile range (IQR): 40

Descriptive statistics

Standard deviation: 32.89532448
Coefficient of variation (CV): 0.7079166623
Kurtosis: 3.809824488
Mean: 46.46779237
Median Absolute Deviation (MAD): 18
Skewness: 1.515531258
Sum: 74302
Variance: 1082.102373
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
28 | 43 | 2.7%
24 | 36 | 2.3%
18 | 35 | 2.2%
15 | 35 | 2.2%
23 | 34 | 2.1%
14 | 33 | 2.1%
20 | 33 | 2.1%
31 | 32 | 2.0%
38 | 31 | 1.9%
27 | 30 | 1.9%
Other values (134) | 1257 | 78.6%

Value | Count | Frequency (%)
6 | 3 | 0.2%
7 | 4 | 0.3%
8 | 14 | 0.9%
9 | 14 | 0.9%
10 | 27 | 1.7%
11 | 26 | 1.6%
12 | 29 | 1.8%
13 | 28 | 1.8%
14 | 33 | 2.1%
15 | 35 | 2.2%

Value | Count | Frequency (%)
289 | 1 | 0.1%
278 | 1 | 0.1%
165 | 1 | 0.1%
160 | 1 | 0.1%
155 | 1 | 0.1%
153 | 1 | 0.1%
152 | 1 | 0.1%
151 | 2 | 0.1%
149 | 1 | 0.1%
148 | 2 | 0.1%

density
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 436
Distinct (%): 27.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.9967466792
Minimum: 0.99007
Maximum: 1.00369
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 12.6 KiB

Quantile statistics

Minimum: 0.99007
5-th percentile: 0.993598
Q1: 0.9956
median: 0.99675
Q3: 0.997835
95-th percentile: 1
Maximum: 1.00369
Range: 0.01362
Interquartile range (IQR): 0.002235

Descriptive statistics

Standard deviation: 0.001887333954
Coefficient of variation (CV): 0.001893494098
Kurtosis: 0.9340790655
Mean: 0.9967466792
Median Absolute Deviation (MAD): 0.00113
Skewness: 0.07128766295
Sum: 1593.79794
Variance: 3.562029453 × 10^-6
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0.9972 | 36 | 2.3%
0.9968 | 35 | 2.2%
0.9976 | 35 | 2.2%
0.998 | 29 | 1.8%
0.9962 | 28 | 1.8%
0.9978 | 26 | 1.6%
0.9964 | 25 | 1.6%
0.9994 | 24 | 1.5%
0.997 | 24 | 1.5%
0.9982 | 23 | 1.4%
Other values (426) | 1314 | 82.2%

Value | Count | Frequency (%)
0.99007 | 2 | 0.1%
0.9902 | 1 | 0.1%
0.99064 | 2 | 0.1%
0.9908 | 1 | 0.1%
0.99084 | 1 | 0.1%
0.9912 | 1 | 0.1%
0.9915 | 1 | 0.1%
0.99154 | 1 | 0.1%
0.99157 | 1 | 0.1%
0.9916 | 2 | 0.1%

Value | Count | Frequency (%)
1.00369 | 2 | 0.1%
1.0032 | 1 | 0.1%
1.00315 | 3 | 0.2%
1.00289 | 1 | 0.1%
1.0026 | 2 | 0.1%
1.00242 | 2 | 0.1%
1.0022 | 2 | 0.1%
1.0021 | 2 | 0.1%
1.0018 | 1 | 0.1%
1.0015 | 2 | 0.1%

sulphates
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 96
Distinct (%): 6.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.658148843
Minimum: 0.33
Maximum: 2
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 12.6 KiB

Quantile statistics

Minimum: 0.33
5-th percentile: 0.47
Q1: 0.55
median: 0.62
Q3: 0.73
95-th percentile: 0.93
Maximum: 2
Range: 1.67
Interquartile range (IQR): 0.18

Descriptive statistics

Standard deviation: 0.1695069796
Coefficient of variation (CV): 0.2575511321
Kurtosis: 11.72025073
Mean: 0.658148843
Median Absolute Deviation (MAD): 0.08
Skewness: 2.428672354
Sum: 1052.38
Variance: 0.02873261613
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0.6 | 69 | 4.3%
0.58 | 68 | 4.3%
0.54 | 68 | 4.3%
0.62 | 61 | 3.8%
0.56 | 60 | 3.8%
0.57 | 55 | 3.4%
0.59 | 51 | 3.2%
0.53 | 51 | 3.2%
0.55 | 50 | 3.1%
0.63 | 48 | 3.0%
Other values (86) | 1018 | 63.7%

Value | Count | Frequency (%)
0.33 | 1 | 0.1%
0.37 | 2 | 0.1%
0.39 | 6 | 0.4%
0.4 | 4 | 0.3%
0.42 | 5 | 0.3%
0.43 | 8 | 0.5%
0.44 | 16 | 1.0%
0.45 | 12 | 0.8%
0.46 | 18 | 1.1%
0.47 | 19 | 1.2%

Value | Count | Frequency (%)
2 | 1 | 0.1%
1.98 | 1 | 0.1%
1.95 | 2 | 0.1%
1.62 | 1 | 0.1%
1.61 | 1 | 0.1%
1.59 | 1 | 0.1%
1.56 | 1 | 0.1%
1.36 | 3 | 0.2%
1.34 | 1 | 0.1%
1.33 | 1 | 0.1%

alcohol
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 65
Distinct (%): 4.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.42298311
Minimum: 8.4
Maximum: 14.9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 12.6 KiB

Quantile statistics

Minimum: 8.4
5-th percentile: 9.2
Q1: 9.5
median: 10.2
Q3: 11.1
95-th percentile: 12.5
Maximum: 14.9
Range: 6.5
Interquartile range (IQR): 1.6

Descriptive statistics

Standard deviation: 1.065667582
Coefficient of variation (CV): 0.1022420904
Kurtosis: 0.2000293113
Mean: 10.42298311
Median Absolute Deviation (MAD): 0.7
Skewness: 0.8608288069
Sum: 16666.35
Variance: 1.135647395
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
9.5 | 139 | 8.7%
9.4 | 103 | 6.4%
9.8 | 78 | 4.9%
9.2 | 72 | 4.5%
10 | 67 | 4.2%
10.5 | 67 | 4.2%
11 | 59 | 3.7%
9.6 | 59 | 3.7%
9.3 | 59 | 3.7%
9.7 | 54 | 3.4%
Other values (55) | 842 | 52.7%

Value | Count | Frequency (%)
8.4 | 2 | 0.1%
8.5 | 1 | 0.1%
8.7 | 2 | 0.1%
8.8 | 2 | 0.1%
9 | 30 | 1.9%
9.05 | 1 | 0.1%
9.1 | 23 | 1.4%
9.2 | 72 | 4.5%
9.233333333 | 1 | 0.1%
9.25 | 1 | 0.1%

Value | Count | Frequency (%)
14.9 | 1 | 0.1%
14 | 7 | 0.4%
13.6 | 4 | 0.3%
13.56666667 | 1 | 0.1%
13.5 | 1 | 0.1%
13.4 | 3 | 0.2%
13.3 | 3 | 0.2%
13.2 | 1 | 0.1%
13.1 | 2 | 0.1%
13 | 6 | 0.4%

quality
Real number (ℝ≥0)

ZEROS

Distinct: 3
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.096310194
Minimum: 0
Maximum: 2
Zeros: 63
Zeros (%): 3.9%
Negative: 0
Negative (%): 0.0%
Memory size: 12.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1
median: 1
Q3: 1
95-th percentile: 2
Maximum: 2
Range: 2
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.4073543492
Coefficient of variation (CV): 0.3715685136
Kurtosis: 2.3744303
Mean: 1.096310194
Median Absolute Deviation (MAD): 0
Skewness: 0.7040665651
Sum: 1753
Variance: 0.1659375658
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=3)
Value | Count | Frequency (%)
1 | 1319 | 82.5%
2 | 217 | 13.6%
0 | 63 | 3.9%

Value | Count | Frequency (%)
0 | 63 | 3.9%
1 | 1319 | 82.5%
2 | 217 | 13.6%

Value | Count | Frequency (%)
2 | 217 | 13.6%
1 | 1319 | 82.5%
0 | 63 | 3.9%

Interactions

Correlations

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
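
The calculation can be sketched in plain Python as ranking both variables and then applying the covariance-over-standard-deviations formula to the ranks. This is an illustrative sketch, not the code the report itself runs; the function names and the choice to give tied values their average rank are assumptions.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)


def spearman_rho(x, y):
    """Pearson correlation computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Toy data (not from the dataset): y = x**3 is monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y), pearson_r(x, y))
```

On this toy input the rank correlation is perfect while the linear correlation is below 1, which is the difference the paragraph above describes.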

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
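
A minimal sketch of this formula in plain Python follows, including a check of the invariance under changes of location and scale. The function name and the toy inputs are illustrative assumptions; the report's own implementation is not shown here.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)


# Toy data, not taken from the dataset.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
# Shifting and rescaling x (with a positive scale) leaves r unchanged.
r_shifted = pearson_r([3 * v + 7 for v in x], y)
print(r, r_shifted)
```

The same normalisation (dividing by n or by n - 1) must be used for the covariance and both standard deviations; the factor then cancels, so either convention gives the same r.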

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
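
The pair-counting definition can be sketched directly in plain Python. This version corresponds to the simple form described above (often called tau-a) and makes no correction for ties, so its result may differ from a library that applies a tie-adjusted variant; the function name is an illustrative assumption.

```python
from itertools import combinations


def kendall_tau(x, y):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1  # both variables order the pair the same way
        elif sign < 0:
            discordant += 1  # the variables order the pair in opposite ways
    n = len(x)
    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs


# Toy data: of the three pairs, two are concordant and one is discordant.
print(kendall_tau([1, 2, 3], [1, 3, 2]))
```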

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for Phik.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

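index | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | total sulfur dioxide | density | sulphates | alcohol | quality
0 | 7.4 | 0.70 | 0.00 | 1.9 | 0.076 | 34.0 | 0.9978 | 0.56 | 9.4 | 1
1 | 7.8 | 0.88 | 0.00 | 2.6 | 0.098 | 67.0 | 0.9968 | 0.68 | 9.8 | 1
2 | 7.8 | 0.76 | 0.04 | 2.3 | 0.092 | 54.0 | 0.9970 | 0.65 | 9.8 | 1
3 | 11.2 | 0.28 | 0.56 | 1.9 | 0.075 | 60.0 | 0.9980 | 0.58 | 9.8 | 1
4 | 7.4 | 0.70 | 0.00 | 1.9 | 0.076 | 34.0 | 0.9978 | 0.56 | 9.4 | 1
5 | 7.4 | 0.66 | 0.00 | 1.8 | 0.075 | 40.0 | 0.9978 | 0.56 | 9.4 | 1
6 | 7.9 | 0.60 | 0.06 | 1.6 | 0.069 | 59.0 | 0.9964 | 0.46 | 9.4 | 1
7 | 7.3 | 0.65 | 0.00 | 1.2 | 0.065 | 21.0 | 0.9946 | 0.47 | 10.0 | 2
8 | 6.7 | 0.58 | 0.08 | 1.8 | 0.097 | 65.0 | 0.9959 | 0.54 | 9.2 | 1
9 | 7.5 | 0.50 | 0.36 | 6.1 | 0.071 | 102.0 | 0.9978 | 0.80 | 10.5 | 1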

Last rows

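index | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | total sulfur dioxide | density | sulphates | alcohol | quality
1589 | 6.9 | 0.685 | 0.00 | 2.5 | 0.105 | 37.0 | 0.99660 | 0.57 | 10.6 | 1
1590 | 5.6 | 0.310 | 0.78 | 13.9 | 0.074 | 92.0 | 0.99677 | 0.48 | 10.5 | 1
1591 | 7.7 | 0.690 | 0.49 | 1.8 | 0.115 | 112.0 | 0.99680 | 0.71 | 9.3 | 1
1592 | 9.4 | 0.500 | 0.34 | 3.6 | 0.082 | 14.0 | 0.99870 | 0.52 | 10.7 | 1
1593 | 6.8 | 0.560 | 0.22 | 1.8 | 0.074 | 24.0 | 0.99438 | 0.82 | 11.2 | 1
1594 | 7.2 | 0.480 | 0.07 | 5.5 | 0.089 | 18.0 | 0.99684 | 0.68 | 11.2 | 2
1595 | 7.6 | 0.950 | 0.03 | 2.0 | 0.090 | 20.0 | 0.99590 | 0.56 | 9.6 | 1
1596 | 5.0 | 0.380 | 0.01 | 1.6 | 0.048 | 60.0 | 0.99084 | 0.75 | 14.0 | 1
1597 | 10.1 | 0.280 | 0.46 | 1.8 | 0.050 | 13.0 | 0.99740 | 0.79 | 10.2 | 1
1598 | 7.0 | 0.400 | 0.32 | 3.6 | 0.061 | 29.0 | 0.99416 | 0.49 | 11.3 | 2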

Duplicate rows

Most frequently occurring

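index | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | total sulfur dioxide | density | sulphates | alcohol | quality | # duplicates
22 | 6.7 | 0.460 | 0.24 | 1.7 | 0.077 | 34.0 | 0.99480 | 0.60 | 10.6 | 1 | 4
52 | 7.2 | 0.360 | 0.46 | 2.1 | 0.074 | 44.0 | 0.99534 | 0.85 | 11.0 | 2 | 4
63 | 7.2 | 0.695 | 0.13 | 2.0 | 0.076 | 20.0 | 0.99546 | 0.54 | 10.1 | 1 | 4
81 | 7.5 | 0.510 | 0.02 | 1.7 | 0.084 | 31.0 | 0.99538 | 0.54 | 10.5 | 1 | 4
5 | 6.0 | 0.500 | 0.00 | 1.4 | 0.057 | 26.0 | 0.99448 | 0.45 | 9.5 | 1 | 3
12 | 6.4 | 0.640 | 0.21 | 1.8 | 0.081 | 31.0 | 0.99689 | 0.66 | 9.8 | 1 | 3
39 | 7.0 | 0.650 | 0.02 | 2.1 | 0.066 | 25.0 | 0.99720 | 0.67 | 9.5 | 1 | 3
40 | 7.0 | 0.690 | 0.07 | 2.5 | 0.091 | 21.0 | 0.99572 | 0.60 | 11.3 | 1 | 3
60 | 7.2 | 0.630 | 0.00 | 1.9 | 0.097 | 38.0 | 0.99675 | 0.58 | 9.0 | 1 | 3
104 | 7.8 | 0.600 | 0.26 | 2.0 | 0.080 | 131.0 | 0.99622 | 0.52 | 9.9 | 1 | 3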